Compiling queries for high-performance computing
Authors
Abstract
Data-intensive applications motivate the integration of high-productivity query languages with high-performance computing runtimes. We present a technique, compiled parallel pipelines (CPP), for compiling relational query plans to programs suitable for high-performance computing platforms. Rather than composing a sequential query compiler with a high-performance communication library such as MPI, we take a holistic approach that leverages the capabilities of parallel languages. For each pipeline in the query plan, CPP generates a parallel partitioned global address space (PGAS) program. This approach affords a modular design, and it allows the compiler to reason about whole pipelines, including their parallelism and communication. Using PGAS to execute queries efficiently requires designing efficient shared data structures, generating code that avoids extra messages, and mitigating the overhead of an execution model based on fine-grained tasks. We implement our technique as a system called RADISH. Our evaluation shows that CPP is 5.5× faster than compiled iterators on TPC-H queries. To show that RADISH is a practical system for in-memory analytics, we also compare the performance of RADISH on TPC-H with that of the MPP system DBX and find it to be competitive. Our work takes important first steps toward integrating query processing and distributed HPC.
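The pipeline-at-a-time idea can be illustrated with a small, sequential Python simulation. This is a sketch under stated assumptions, not RADISH's generated code. The partition count `NUM_PARTS`, the helpers `owner`, `split`, `build_pipeline`, and `probe_pipeline`, and the TPC-H-like `(order_key, priority)` and `(order_key, price)` rows are all hypothetical. Each partition stands in for one PGAS node. Each pipeline's operators (scan, filter, hash-table insert or probe, aggregation) are fused into one loop per partition. The join's hash table is partitioned by key, so every key has a single owner. Tuples bound for the same owner are batched, so each partition sends at most one message per destination rather than one per tuple.

```python
from collections import defaultdict

# Hypothetical sketch: a sequential simulation of pipeline-at-a-time
# execution over a partitioned global address space. Not RADISH's code.
NUM_PARTS = 4  # hypothetical number of PGAS partitions (nodes)


def owner(key: int) -> int:
    """Partition that owns `key` in the shared, hash-partitioned table."""
    return key % NUM_PARTS


def split(rows, n=NUM_PARTS):
    """Split input into n contiguous slices, one local slice per partition."""
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def build_pipeline(orders_parts, min_priority):
    """Pipeline 1: scan -> filter -> insert into the partitioned hash table."""
    tables = [{} for _ in range(NUM_PARTS)]
    messages = 0
    for local_rows in orders_parts:           # each partition scans locally
        outbox = defaultdict(list)            # tuples grouped by destination
        for order_key, priority in local_rows:
            if priority >= min_priority:      # filter fused into the scan loop
                outbox[owner(order_key)].append((order_key, priority))
        for dest, batch in outbox.items():    # one batched message per owner
            messages += 1
            for order_key, priority in batch:
                tables[dest][order_key] = priority
    return tables, messages


def probe_pipeline(lineitem_parts, tables):
    """Pipeline 2: scan -> route to key owner -> probe -> partial sum."""
    partial = [0.0] * NUM_PARTS               # one partial aggregate per owner
    messages = 0
    for local_rows in lineitem_parts:
        outbox = defaultdict(list)
        for order_key, price in local_rows:
            outbox[owner(order_key)].append((order_key, price))
        for dest, batch in outbox.items():    # one batched message per owner
            messages += 1
            for order_key, price in batch:
                if order_key in tables[dest]:  # probe only the owner's table
                    partial[dest] += price
    return sum(partial), messages             # final global reduction


orders = [(k, k % 5) for k in range(1, 21)]         # (order_key, priority)
lineitem = [(k % 20 + 1, 10.0) for k in range(40)]  # (order_key, price)
tables, build_msgs = build_pipeline(split(orders), min_priority=3)
total, probe_msgs = probe_pipeline(split(lineitem), tables)
print(total, build_msgs, probe_msgs)  # prints: 160.0 8 16
```

In this toy input, the probe pipeline moves 40 tuples in 16 batched messages instead of 40 per-tuple ones. A real PGAS runtime would run the partitions in parallel and perform the sends as remote operations. The sequential loops here only model where data moves and how often.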
Similar resources
SESOS: A Verifiable Searchable Outsourcing Scheme for Ordered Structured Data in Cloud Computing
While cloud computing is growing at a remarkable speed, privacy issues are far from being solved. One way to diminish privacy concerns is to store data on the cloud in encrypted form. However, encryption often hinders useful computation by cloud services. A theoretical approach is to employ the so-called fully homomorphic encryption, yet the overhead is so high that it is not considered a viable s...
DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases
We present DBToaster, a novel query compilation framework for producing high-performance compiled query executors that incrementally and continuously answer standing aggregate queries using in-memory views. DBToaster targets applications that require efficient main-memory processing of standing queries (views) fed by high-volume data streams, recursively compiling view maintenance (VM) queries ...
Green Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint, and CO2 emissions in high-performance computing systems (HPCs) such as clusters, Grid, and Cloud that a large number of parallel... Reducing energy consumption for high-end computing can bring various benefits such as red...
Compiling Matlab for High Performance Computing via X10 (Sable Technical Report)
Matlab is a popular dynamic array-based language commonly used by students, scientists and engineers, who appreciate the interactive development style, the rich set of array operators, the extensive built-in library, and the fact that they do not have to declare static types. Even though these users like to program in Matlab, their computations are often very compute-intensive and are better suit...
Fast Query Evaluation with (Lazy) Control Flow Compilation
Learning algorithms such as decision tree learners dynamically generate a huge amount of large queries. Because these queries are executed often, the trade-off between meta-calling and compiling and running them has been in favor of the latter, as compiled code is faster. This paper presents a technique named control flow compilation, which improves the compilation time of the queries by an order...